Archiving with SAM-FS differs from conventional archiving, which removes both the data and its pointers (metadata) from on-line storage. With SAM-FS, the archiving operation copies the data from on-line disk storage to one or more removable media volumes, and the original data and its metadata remain on disk.
Releasing frees on-line disk space by removing the data of files that have already been archived. The data is removed, but the metadata stays on-line, so the file still appears to the user to be on-line.
When a released file is accessed, SAM-FS automatically stages, or restores, the file to disk cache. For a sequential read of an off-line file, the read operation tracks directly behind the staging operation, enabling the user to start working with the file before the entire file is staged.
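The read-behind-stage behavior can be pictured as a producer/consumer pair. The following is a minimal sketch, and `StagedFile`, `stage` and `sequential_read` are hypothetical names rather than SAM-FS interfaces. A stager thread stands in for the copy from removable media, and a `threading.Condition` makes the reader wait only until the bytes it needs have reached the disk cache.

```python
import threading


class StagedFile:
    """Hypothetical disk-cache copy of a file that is being staged.

    The stager appends chunks as they arrive. A reader blocks only until the
    range it asked for is present, so a sequential read trails just behind
    the stage instead of waiting for the whole file.
    """

    def __init__(self) -> None:
        self._data = bytearray()
        self._complete = False
        self._cond = threading.Condition()

    def append(self, chunk: bytes) -> None:
        # Called by the staging thread as each chunk is copied to disk cache.
        with self._cond:
            self._data.extend(chunk)
            self._cond.notify_all()

    def finish(self) -> None:
        # Marks the stage as complete so readers stop waiting at end of file.
        with self._cond:
            self._complete = True
            self._cond.notify_all()

    def read(self, offset: int, size: int) -> bytes:
        # Wait until the requested range is staged, or the stage has finished.
        with self._cond:
            self._cond.wait_for(
                lambda: len(self._data) >= offset + size or self._complete
            )
            return bytes(self._data[offset:offset + size])


def stage(src: bytes, dst: StagedFile, chunk_size: int = 4) -> None:
    # Simulated copy from removable media, one chunk at a time.
    for i in range(0, len(src), chunk_size):
        dst.append(src[i:i + chunk_size])
    dst.finish()


def sequential_read(f: StagedFile, size: int = 3) -> bytes:
    # Reads the file front to back; returns once an empty read signals EOF.
    out = bytearray()
    offset = 0
    while True:
        piece = f.read(offset, size)
        if not piece:
            break
        out.extend(piece)
        offset += len(piece)
    return bytes(out)


if __name__ == "__main__":
    original = b"archived file contents on tape"
    cached = StagedFile()
    stager = threading.Thread(target=stage, args=(original, cached))
    stager.start()
    print(sequential_read(cached) == original)
    stager.join()
```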
As users modify files, archive copies associated with old versions should be purged. The Recycler identifies removable media volumes with a large proportion of expired archive copies, and rearchives the useful data to different volumes. Once the useful data has been rearchived, the volume can be relabeled for reuse, or, if a historical record of file changes is required, the recycled volume can be moved to long-term off-site storage.
A UNIX file system (UFS) normally cannot span more than one physical device. SAM-FS enables file system data to be written across multiple disk partitions through software disk striping.
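The following is a minimal sketch of how round-robin striping can map a logical block number to a partition. It does not describe SAM-FS's actual on-disk layout. `locate` and its `stripe_width` parameter are hypothetical. `stripe_width` is the number of consecutive blocks written to one partition before moving on to the next.

```python
from typing import NamedTuple


class StripeLocation(NamedTuple):
    partition: int  # index of the disk partition within the family set
    block: int      # block number within that partition


def locate(logical_block: int, n_partitions: int, stripe_width: int = 1) -> StripeLocation:
    """Map a logical block to (partition, block), assuming round-robin striping."""
    if n_partitions < 1 or stripe_width < 1:
        raise ValueError("need at least one partition and a positive stripe width")
    stripe, within = divmod(logical_block, stripe_width)  # which stripe unit, offset in it
    partition = stripe % n_partitions                      # units rotate across partitions
    row = stripe // n_partitions                           # full rotations completed so far
    return StripeLocation(partition, row * stripe_width + within)


if __name__ == "__main__":
    # Three partitions, two blocks per stripe unit.
    for b in range(8):
        print(b, locate(b, 3, 2))
```

Consecutive stripe units land on different partitions, so a large sequential transfer can keep several disks busy at the same time.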
Another limit of a traditional UNIX file system is the total number of files it can catalog, because the number of inodes is fixed when the file system is created. SAM-FS allocates the file information (the inodes) dynamically, so the number of files is limited only by the amount of mass storage.
When a file system is mounted, the storage family set is specified as the mount device, not the disk partitions.
SAM-FS uses forward allocation to keep the blocks allocated to a given file close to each other. When blocks are assigned to a file, the small blocks are very likely to be contiguous within the cylinder. Because forward allocation applies to every block allocated, the next block is also very likely to be near the current one. This prevents a file's data blocks from being scattered, which is what happens when blocks freed by deleted files are reused immediately. With forward allocation, blocks from deleted files are eventually reassigned as a group to new files.
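The following toy allocator is a minimal sketch of the forward-allocation idea, and `ForwardAllocator` is a hypothetical name. It keeps a cursor and always searches forward from the most recently allocated block. Blocks freed behind the cursor are therefore reused only when the cursor wraps around to them, and at that point they are handed out together.

```python
from __future__ import annotations


class ForwardAllocator:
    """Toy block allocator illustrating forward allocation (hypothetical)."""

    def __init__(self, n_blocks: int) -> None:
        self.free = [True] * n_blocks
        self.cursor = 0  # next block number to examine

    def allocate(self, count: int) -> list[int]:
        blocks: list[int] = []
        n = len(self.free)
        scanned = 0
        # Search forward from the cursor, wrapping at most once around the disk.
        while len(blocks) < count and scanned < n:
            b = self.cursor
            self.cursor = (self.cursor + 1) % n
            scanned += 1
            if self.free[b]:
                self.free[b] = False
                blocks.append(b)
        if len(blocks) < count:
            for b in blocks:  # roll back a partial allocation
                self.free[b] = True
            raise MemoryError("not enough free blocks")
        return blocks

    def release(self, blocks: list[int]) -> None:
        # Freed blocks are not reused until the cursor comes back around.
        for b in blocks:
            self.free[b] = True


if __name__ == "__main__":
    a = ForwardAllocator(10)
    f1 = a.allocate(3)   # blocks 0-2
    f2 = a.allocate(3)   # blocks 3-5
    a.release(f1)        # deleting f1 frees 0-2 ...
    print(a.allocate(3)) # ... but the next file still gets 6-8
```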
SAM-FS operates as a file system under Solaris 2, and can utilize the mirroring capability provided through Sun's DiskSuite.
SAM-FS labels all removable media with ANSI standard volume labels. Each piece of media must be labeled with a unique volume serial name (VSN). SAM-FS writes all archived data using tar format. The use of tar format preserves the original filename, owner and group in effect at the time the file was archived.
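The following sketch illustrates why a tar image preserves the file's name, owner and group. It uses Python's standard `tarfile` module in POSIX ustar format, and it does not show SAM-FS's own archive writer. The helper names `archive_image` and `describe` and the numeric IDs are illustrative assumptions.

```python
from __future__ import annotations

import io
import tarfile


def archive_image(name: str, data: bytes, owner: str, group: str) -> bytes:
    """Write one file into an in-memory tar image, recording owner and group."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    info.uname = owner   # owner name stored in the tar header
    info.gname = group   # group name stored in the tar header
    info.uid = 1001      # illustrative numeric IDs
    info.gid = 100
    info.mtime = 0
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def describe(image: bytes) -> list[tuple[str, str, str, bytes]]:
    """Read back (name, owner, group, contents) for every member of the image."""
    with tarfile.open(fileobj=io.BytesIO(image), mode="r") as tar:
        out = []
        for member in tar.getmembers():
            content = tar.extractfile(member).read()
            out.append((member.name, member.uname, member.gname, content))
        return out


if __name__ == "__main__":
    image = archive_image("projects/report.txt", b"hello", "alice", "staff")
    print(describe(image))
```

Because the header carries this information, any system with a standard tar reader can recover the file and its ownership from the archive media.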
SAM-FS supports the standard UNIX file types of regular file, directory file and symbolic link. In addition, SAM-FS has a new file type, designated as a removable media file, which allows users to access data stored on removable media without knowing its physical location. A removable media file contains the access and position information that identifies the media and file resident on that volume.
When a user opens a removable media file, SAM-FS requests that the media be mounted if it is not already mounted. Information written to or read from the file is transparently transferred to or from the appropriate physical device.
To ensure that files are complete before archiving, the Archiver allows files to age for a period of time (archive age) before archiving the file.
SAM-FS provides for archive sets, which are groupings of files that match criteria such as minimum size, maximum size, owner, group or directory location. Each archive set is associated with a collection of removable media. Archive sets control the destination of the archive copy, wait time before the file is archived, and the length of time to retain the archive copy.
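The following is a minimal sketch of how archive-set criteria and archive age could combine to decide which files are ready to archive. The field names, the first-matching-set rule and the use of modification time as the age reference are all assumptions for illustration. Retention time is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    path: str
    size: int
    owner: str
    group: str
    mtime: float  # last modification time, seconds since the epoch


@dataclass
class ArchiveSet:
    name: str
    vsns: list[str]        # removable media assigned to this set
    archive_age: float     # seconds a file must sit unmodified before archiving
    directory: Optional[str] = None
    min_size: Optional[int] = None
    max_size: Optional[int] = None
    owner: Optional[str] = None
    group: Optional[str] = None

    def matches(self, f: FileInfo) -> bool:
        # Every criterion that is set must hold; unset criteria match anything.
        if self.directory is not None and not f.path.startswith(self.directory.rstrip("/") + "/"):
            return False
        if self.min_size is not None and f.size < self.min_size:
            return False
        if self.max_size is not None and f.size > self.max_size:
            return False
        if self.owner is not None and f.owner != self.owner:
            return False
        if self.group is not None and f.group != self.group:
            return False
        return True


def select_for_archiving(files: list[FileInfo], sets: list[ArchiveSet], now: float) -> list[tuple[str, str]]:
    """Return (path, archive set name) for files that have aged enough to archive.

    Each file belongs to the first set it matches (an assumption of this sketch).
    """
    ready = []
    for f in files:
        for s in sets:
            if s.matches(f):
                if now - f.mtime >= s.archive_age:
                    ready.append((f.path, s.name))
                break
    return ready
```

Placing a catch-all set last, with no criteria, makes sure that every file falls into some archive set.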
In addition to the default archive time set by the system administrator, the user has the option to:
The user has the option to:
The user is given the flexibility to:
Staging of the initial file proceeds normally; however, the other files with the Associative Staging attribute enabled are also staged, so they are available immediately when the user requests them. Associative Staging not only reduces manual intervention by the user and speeds access to related files, it also significantly reduces robot motion and media shuffling.
Direct Access and Pre-Staging -- To use on-line storage efficiently while still providing quick access to near-line data, a file can be marked with the "never stage" attribute. The file is then accessed directly from the removable media: no stage is done, and the file remains off-line. Direct access allows large near-line databases to be accessed efficiently. Direct access is supported for files resident on tape or optical disk.
For applications which need to access large portions of the data, the file can be pre-staged to disk. Although not a requirement on the part of the user, pre-staging may provide improved usage of device resources by allowing stage requests for files resident on the same archive media to be batched together.
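The following sketch shows the batching that pre-staging makes possible. Pending stage requests are grouped by VSN, so each volume needs to be mounted only once. `StageRequest` and `batch_by_volume` are hypothetical names. Ordering each batch by position on the media is an added assumption, meant to show how a drive could read the copies in a single forward pass.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class StageRequest(NamedTuple):
    path: str
    vsn: str       # volume holding the archive copy
    position: int  # position of the archive copy on that volume


def batch_by_volume(requests: list[StageRequest]) -> dict[str, list[StageRequest]]:
    """Group stage requests by volume, ordered by position within each volume."""
    batches: defaultdict[str, list[StageRequest]] = defaultdict(list)
    for r in requests:
        batches[r.vsn].append(r)
    return {vsn: sorted(rs, key=lambda r: r.position) for vsn, rs in batches.items()}


if __name__ == "__main__":
    pending = [
        StageRequest("/data/a", "VSN001", 300),
        StageRequest("/data/b", "VSN002", 10),
        StageRequest("/data/c", "VSN001", 20),
    ]
    for vsn, batch in batch_by_volume(pending).items():
        print(vsn, [r.path for r in batch])  # one mount per volume
```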
Backup systems take a snapshot of the current state of the file system. Recovering a file (usually after a loss) involves an extraction process that copies the file from the backup media onto on-line storage. SAM-FS provides samfsdump, a utility for backing up metadata.
With SAM-FS, recovery from a disk failure is very quick, usually a matter of minutes. All that is needed is to reload the metadata onto a new disk using the samfsrestore utility. The archived data on removable media is complete and does not have to be staged onto the new disk.
If a file has not been archived, its data on the failed magnetic disk is lost, and the file is marked as "damaged." The damaged marking tells end users that the file is unusable and should be recreated. A damaged file can only be removed. Detecting the damage at the time the incident occurs significantly increases the chances that the lost data can be recovered, reconstructed, or regenerated by other means.
sam-init also controls the archive process, archiver; the release process, releaser; the associative staging process, stageall; the recycling process, recycler; and the remote request server, rpc.sam.
Scanner -- SAM-FS matches the presence of labeled media with that of requested media and automatically connects the user to the physical media during the first I/O operation. All manually-loaded peripheral devices are monitored by the device scanner, as is the media request queue. When the scanner observes the presence of requested media on a particular device (depending upon access restrictions that are currently imposed) the device scanner instructs SAM-FS to connect the device with the open file that is requesting the media. At this point the job is awakened and processing continues. The device remains assigned until the file is closed.
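The following is a minimal sketch of one device-scanner pass. Queued mount requests are matched against the labels of media found in manually loaded drives. A matching idle device is assigned to the requesting file and stays assigned until that file is closed. `Device`, `MountRequest`, `scan` and `close_file` are hypothetical names, and access restrictions are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    name: str
    loaded_vsn: Optional[str] = None   # label read from the media in the drive, if any
    assigned_to: Optional[int] = None  # id of the open file using this device


@dataclass
class MountRequest:
    request_id: int
    vsn: str


def scan(devices: list[Device], queue: list[MountRequest]) -> list[int]:
    """One scanner pass: connect queued requests to idle devices holding their VSN.

    Returns the ids of satisfied requests (their jobs can be awakened).
    Unsatisfied requests stay queued for the next pass.
    """
    satisfied = []
    for req in list(queue):
        for dev in devices:
            if dev.assigned_to is None and dev.loaded_vsn == req.vsn:
                dev.assigned_to = req.request_id  # held until the file is closed
                queue.remove(req)
                satisfied.append(req.request_id)
                break
    return satisfied


def close_file(devices: list[Device], request_id: int) -> None:
    # Closing the file releases the device for other requests.
    for dev in devices:
        if dev.assigned_to == request_id:
            dev.assigned_to = None
```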
Robots -- The robotics manager manages requests for files, maintains an inventory of the media in the robotic device, and manages the movement of media between storage slots and drives. Robotic devices are treated as family set devices: the jukebox and the drives it contains are the members of the family set. The robotics manager monitors the request queue and schedules the automatic mounting of requested media, including issuing robotic motion-control instructions to the device. Once the media is present in one of the member drives, the robotics manager detects it and completes the connection between the user and the device. After the request completes, the drive keeps the media mounted for future use until files on another medium are requested.
Archiver -- The Archiver selects files to be copied onto their targeted removable media. All archived files are recorded in tar format to ensure data compatibility and mobility with other Solaris and non-Solaris systems.
Each file system is logically organized into archive sets. The archive sets allow grouping files together for copying to removable media. There is no software limit imposed on the number of archive sets.
Releaser -- The releaser attempts to release the disk space of archived files on a file system until the low-water mark is reached. The releaser is started automatically when the disk cache reaches the high-water mark. A weighting factor is used to determine which files should be released first.
The releaser builds an ordered list of the files that have been archived. The position of a file in the list is determined by a priority based on file size and the age since last accessed or modified. Starting at the top of the list, the disk space used by each file is released until the low-water mark has been reached. If the list is exhausted before the low-water mark is reached, the process is repeated.
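The following is a minimal sketch of a single releaser pass. It triggers at the high-water mark, orders archived files by a weighted priority, and releases files until usage falls to the low-water mark. The linear priority formula and the `size_weight`/`age_weight` parameters are assumptions standing in for the unspecified weighting factor. `CachedFile` and `release_space` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CachedFile:
    path: str
    size: int          # bytes occupied in the disk cache
    last_use: float    # later of last access and last modification, epoch seconds
    archived: bool
    released: bool = False


def release_space(files: list[CachedFile], used: int, capacity: int,
                  high: float, low: float, now: float,
                  size_weight: float = 1.0, age_weight: float = 1.0) -> tuple[list[str], int]:
    """Release archived files, highest priority first, down to the low-water mark.

    high and low are fractions of capacity. The weights scale size (bytes) and
    age (seconds) into one priority score; the linear form is an assumption.
    Returns the released paths and the resulting usage.
    """
    if used < high * capacity:
        return [], used  # below the high-water mark: releaser not triggered
    candidates = [f for f in files if f.archived and not f.released]
    candidates.sort(key=lambda f: size_weight * f.size + age_weight * (now - f.last_use),
                    reverse=True)
    released = []
    for f in candidates:
        if used <= low * capacity:
            break
        f.released = True   # data removed from disk; metadata stays on-line
        used -= f.size
        released.append(f.path)
    return released, used
```

A single pass can only release files that were archived when it started. Repeating the pass, as the releaser does, picks up files that have been archived in the meantime.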
Recycler -- When a jukebox exceeds its high threshold, the Recycler searches for media holding small amounts of useful data, copies that data to other media, and then relabels the original medium with the same VSN so that it can be reused.
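The following is a minimal sketch of recycler candidate selection. The sketch treats the jukebox's "high threshold" as a fraction of total capacity written, and "small amounts of useful data" as a maximum live-data fraction per volume. Both interpretations are assumptions, and `Volume` and `recycle_candidates` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Volume:
    vsn: str
    capacity: int  # bytes
    written: int   # bytes written to the volume (current and expired copies)
    live: int      # bytes still belonging to current archive copies


def recycle_candidates(volumes: list[Volume], high_threshold: float,
                       max_live_fraction: float) -> list[str]:
    """Return VSNs worth recycling, emptiest first, once the jukebox is over threshold."""
    total_capacity = sum(v.capacity for v in volumes)
    total_written = sum(v.written for v in volumes)
    if total_capacity == 0 or total_written / total_capacity <= high_threshold:
        return []  # jukebox not above its high threshold: nothing to do
    picks = [v for v in volumes
             if v.written > 0 and v.live / v.written <= max_live_fraction]
    # Emptiest first: least live data to rearchive per volume reclaimed.
    return [v.vsn for v in sorted(picks, key=lambda v: v.live / v.written)]
```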
The available tools are samtool, robottool, devicetool and previewtool. samtool acts as a launcher program, providing a simple interface for starting the other tools; it also contains on-line help, displaying a brief description of each of the tools. devicetool displays information about and manages individual devices. Mount requests are displayed and can be cleared using previewtool. robottool displays information about and manages robot devices.
robottool presents a graphical user interface for viewing information about and managing the robot devices associated with SAM-FS. When a robot is selected from the robot list, the VSN catalog and devices associated with that robot are displayed, and the buttons for the commands appropriate to the selected robot and its state become active. The possible actions are Full Audit, Change State, Import Media, and Unload. Selecting Full Audit performs an audit of every VSN in the robot. Change State is an abbreviated menu button used to change the state of the robot; the possible states are on, idle, off, and down. Import Media tells the robot to take in the piece of media that has been placed in its mailbox. Selecting Unload causes the robot to unload the piece of media that is currently in the device.
devicetool presents a graphical user interface for viewing information about and managing individual devices associated with SAM-FS. When a device is selected, the buttons for actions appropriate for that device type are activated below the display. Possible actions are Change State, Unload, Audit, and Label.
previewtool is a graphical user interface for viewing and managing pending mount requests.
A curses-based command, samu, is provided for displays that do not support X. The functionality of each GUI is incorporated into samu.
archive_audit -- generate an audit of all archived files by media type and VSN.
export -- export media from a robot.
import -- import media into a robot.
itemize -- generate a list of files cataloged on a given optical disk or tape and/or generate a list of optical disks or tapes cataloged in a robot.
odlabel -- label optical media.
tplabel -- label tape media.
samdev -- add /dev/samst entries for media libraries, optical disk and tape drives attached to the system.
samfsdump -- dump file control structure data.
samfsrestore -- restore file control structure data.
sammkfs -- construct a new samfs file system.
samu -- execute SAM-FS operator utility.
unarchive -- delete archive entries.
archive -- copy files from magnetic disk cache to removable media.
release -- release the disk space of archived files.
request -- create a removable media file.
sfind -- search for files in a directory hierarchy.
sls -- list directory contents.
stage -- stage archived files from removable media to magnetic disk cache.
When a request is made, the process or program making the request is the client, running on the client machine. The request is received and processed by the server process, running on the server (host) machine. For SAM-FS, the server machine is always the machine where SAM-FS is running.
Two API libraries are available with SAM-FS: libsam and libsamrpc. The library calls in libsam do not do network communications; they make local requests only. Each library call makes a system call, and the server is the local operating system.
The library calls in libsamrpc use Remote Procedure Calls (RPC) to communicate with a special server process. Because of the RPC mechanism, the client and server can exist on the same machine or on differing machines in the network. The server process always runs on the machine where SAM-FS is running.
Archive Storage -- Copies of file data that have been created on removable media for the purpose of long-term off-line storage.
Backup Storage -- A snapshot of a collection of files taken expressly to prevent inadvertent loss. A backup includes both the files' attributes and their associated data.
DAU -- Disk Allocation Unit. A basic unit of on-line storage. SAM-FS uses two sizes: small (4096 bytes) and large (16384 bytes).
Device Scanner -- A function of SAM-FS that periodically monitors all manually mounted SAM-FS removable devices and detects mounted media that may have been requested by a user or other process.
devicetool -- SAM-FS administrative tool with graphical user interface for viewing information about and managing individual devices.
Disk Striping -- The process of recording a file across several disks, thereby improving access performance and increasing overall storage capacity.
Direct Access -- A file attribute that designates that a near-line file can be accessed directly from the archive media and need not be staged on-line for access.
Directory -- A file data structure pointing to other files and directories within the file system.
Thresholds -- The high and low values that define the desired window of available on-line storage. These thresholds set the storage goals for the releaser.
Family Set -- A storage set that is represented by a group of independent devices such as a collection of disks, or the drives mounted within a jukebox device.
File System -- A hierarchical collection of files and directories. The file system is a common contact point for the user and influences the user's view of the operating system.
FTP -- File Transfer Protocol. An Internet protocol for transferring files between two hosts over a TCP/IP based network.
Inode -- Index Node. A data structure used by the file system to describe a file. An inode describes all the attributes associated with a file such as ownership, and where the file is allocated on the disk system.
Inode File -- A special file on the SAM-FS that contains the inode structures for all files resident in the file system.
Jukebox -- A robotically-controlled device designed to automatically load and unload removable media without operator intervention.
Kernel -- The central controlling program that provides basic system facilities. The UNIX kernel creates and manages processes, provides functions to access the file system, provides general security, and supplies communication facilities.
MCF -- Master Configuration File. A file that is read at initialization time and defines the device topology for a SAM-FS server.
Media Recycling -- The process of recycling or reusing archive media with low utilization, that is, media holding few active files.
Name Space -- The portion of a collection of files that identifies the file, its attributes and its storage locations.
Near-line Storage -- Removable storage that must be mounted robotically before it can be accessed. Near-line storage is usually less expensive than on-line storage but has a somewhat longer access time.
NFS -- Network File System. A file system/protocol which allows a UNIX file system to be remotely mounted via a network.
Off-line Storage -- Storage that requires operator intervention for mounting.
Off-site Storage -- Storage that is located away from the primary storage facility and is used for disaster recovery planning.
On-line Storage -- Storage that is immediately available, such as disk storage.
Optical Disk -- A removable storage medium that is written and read with laser beams.
previewtool -- SAM-FS administrative tool with graphical user interface for viewing and managing pending mount requests.
Releaser -- The disk space releaser program, which automatically keeps on-line disk usage between the high and low thresholds.
Removable Media File -- A special type of user file used to access removable media such as magnetic tape or optical disk.
Robotic Device -- A robotically-controlled device designed to automatically load and unload removable media without operator intervention.
robottool -- SAM-FS administrative tool with graphical user interface for viewing and managing robot devices.
RPC -- Remote Procedure Calls. The underlying data exchange mechanism used by NFS. It can be used to implement any custom network data server.
samfsdump -- samfsdump creates a control structure dump. samfsdump copies all the control structure information for a given group of files. It is analogous to the UNIX tar utility, but it does not copy any data.
samfsrestore -- samfsrestore restores a control structure dump.
samtool -- SAM-FS administrative tool with graphical user interface for starting robottool, devicetool and previewtool.
SCSI -- Small Computer System Interface. An electrical and communication specification commonly used for peripheral devices.
Staging -- The process of copying a near- or off-line file from its archive storage back onto on-line storage.
Storage Family Set -- A set of disks that are collectively represented by a single disk family device.
tar -- Tape ARchive. A file/data recording format used by SAM-FS for archive images.
TCP/IP -- Transmission Control Protocol/Internet Protocol. The Internet protocols responsible for host-to-host addressing, routing and packet delivery (IP), and for reliable delivery of data between application endpoints (TCP).
SAM-FS -- The LSC Storage and Archive Manager File System. SAM-FS controls the access to all files stored and all devices configured in the MCF.
VSN -- Volume Serial Name. A logical identifier for magnetic tape and optical disk that is written in the volume label.
WORM -- Write Once Read Many. A storage classification for media that can be written only once, but can be read many times.
(C)1994, 1996 LSC, Inc. All rights reserved.
Storage and Archiving Manager (SAM-FS) is a
trademark of LSC, Inc. All other trademarks are the property of
their respective owners.